Collin County
- North America > United States > Texas > Collin County > Plano (0.05)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (0.95)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.72)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Communications > Social Media (0.73)
- Information Technology > Communications > Mobile (0.48)
Reasoning With a Star: A Heliophysics Dataset and Benchmark for Agentic Scientific Reasoning
Lee, Kevin, Spiewak, Russell, Walsh, James
Scientific reasoning through Large Language Models in heliophysics involves more than just recalling facts: it requires incorporating physical assumptions, maintaining consistent units, and providing clear scientific formats through coordinated approaches. To address these challenges, we present Reasoning With a Star, a newly contributed heliophysics dataset applicable to reasoning; we also provide an initial benchmarking approach. Our data are constructed from National Aeronautics and Space Administration & University Corporation for Atmospheric Research Living With a Star summer school problem sets and compiled into a readily consumable question-and-answer structure with question contexts, reasoning steps, expected answer type, ground-truth targets, format hints, and metadata. A programmatic grader checks the predictions using unit-aware numerical tolerance, symbolic equivalence, and schema validation. We benchmark a single-shot baseline and four multi-agent patterns, finding that decomposing workflows through systems engineering principles outperforms direct prompting on problems requiring deductive reasoning rather than pure inductive recall.
- North America > United States > Texas > Collin County > Frisco (0.40)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- (2 more...)
- Workflow (1.00)
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (0.34)
- Energy (0.46)
- Aerospace & Defense (0.46)
- Government > Space Agency (0.34)
- Government > Regional Government > North America Government > United States Government (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
6d0f9c415e2d779c78f32b74668e9d02-Paper-Datasets_and_Benchmarks_Track.pdf
Fact-checking is extensively studied in the context of misinformation and disinformation, addressing objective inaccuracies. However, a softer form of misinformation involves responses that are factually correct but lack certain features such as clarity and relevance. This challenge is prevalent in formal Question-Answer (QA) settings such as press conferences in finance, politics, sports, and other domains, where subjective answers can obscure transparency. Despite this, there is a lack of manually annotated datasets for subjective features across multiple dimensions. To address this gap, we introduce SubjECTive-QA, a human annotated dataset on Earnings Call Transcripts' (ECTs) QA sessions as the answers given by company representatives are often open to subjective interpretations and scrutiny. The dataset includes 49, 446 annotations for long-form QA pairs across six features: Assertive, Cautious, Optimistic, Specific, Clear, and Relevant . These features are carefully selected to encompass the key attributes that reflect the tone of the answers provided during QA sessions across different domains. Our findings are that the best-performing Pre-trained Language Model (PLM), RoBERTa-base, has similar weighted F1 scores to Llama-3-70b-Chat on features with lower subjectivity, such as Relevant and Clear, with a mean difference of 2 .
- North America > United States > Georgia > Fulton County > Atlanta (0.05)
- Asia > India > Maharashtra > Mumbai (0.05)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- (15 more...)
- Financial News (1.00)
- Research Report > New Finding (0.87)
- Media > News (1.00)
- Law (1.00)
- Banking & Finance > Trading (1.00)
- (2 more...)
An Evaluation Framework for Network IDS/IPS Datasets: Leveraging MITRE ATT&CK and Industry Relevance Metrics
Tori, Adrita Rahman, Hasan, Khondokar Fida
The performance of Machine Learning (ML) and Deep Learning (DL)-based Intrusion Detection and Prevention Systems (IDS/IPS) is critically dependent on the relevance and quality of the datasets used for training and evaluation. However, current AI model evaluation practices for developing IDS/IPS focus predominantly on accuracy metrics, often overlooking whether datasets represent industry-specific threats. To address this gap, we introduce a novel multi-dimensional framework that integrates the MITRE ATT&CK knowledge base for threat intelligence and employs five complementary metrics that together provide a comprehensive assessment of dataset suitability. Methodologically, this framework combines threat intelligence, natural language processing, and quantitative analysis to assess the suitability of datasets for specific industry contexts. Applying this framework to nine publicly available IDS/IPS datasets reveals significant gaps in threat coverage, particularly in the healthcare, energy, and financial sectors. In particular, recent datasets (e.g., CIC-IoMT, CIC-UNSW-NB15) align better with sector-specific threats, whereas others, like CICIoV-24, underperform despite their recency. Our findings provide a standardized, interpretable approach for selecting datasets aligned with sector-specific operational requirements, ultimately enhancing the real-world effectiveness of AI-driven IDS/IPS deployments. The efficiency and practicality of the framework are validated through deployment in a real-world case study, underscoring its capacity to inform dataset selection and enhance the effectiveness of AI-driven IDS/IPS in operational environments.
- Europe > Switzerland > Basel-City > Basel (0.04)
- Oceania > Australia > New South Wales (0.04)
- North America > United States > Texas > Collin County > Frisco (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Telecommunications > Networks (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- (7 more...)
Measuring Algorithmic Partisanship via Zero-Shot Classification and Its Implications on Political Discourse
Amidst the rapid normalization of generative artificial intelligence (GAI), intelligent systems have come to dominate political discourse across information media. However, internalized political biases stemming from training data skews, human prejudice, and algorithmic flaws continue to plague this novel technology. This study employs a zero-shot classification approach to evaluate algorithmic political partisanship through a methodical combination of ideological alignment, topicality, response sentiment, and objectivity. A total of 1800 model responses across six mainstream large language models (LLMs) were individually input into four distinct fine-tuned classification algorithms, each responsible for computing one of the aforementioned metrics. The results show an amplified liberal-authoritarian alignment across the six LLMs evaluated, with notable instances of reasoning supersessions and canned refusals. The study subsequently highlights the psychological influences underpinning human-computer interactions and how intrinsic biases can permeate public discourse. The resulting distortion of the political landscape can ultimately manifest as conformity or polarization, depending on the region's pre-existing socio-political structures.
- Asia > China (0.16)
- North America > United States > California > Orange County > Fullerton (0.04)
- Europe > Russia (0.04)
- (12 more...)
6d0f9c415e2d779c78f32b74668e9d02-Paper-Datasets_and_Benchmarks_Track.pdf
Fact-checking is extensively studied in the context of misinformation and disinformation, addressing objective inaccuracies. However, a softer form of misinformation involves responses that are factually correct but lack certain features such as clarity and relevance. This challenge is prevalent in formal Question-Answer (QA) settings such as press conferences in finance, politics, sports, and other domains, where subjective answers can obscure transparency. Despite this, there is a lack of manually annotated datasets for subjective features across multiple dimensions. To address this gap, we introduce SubjECTive-QA, a human annotated dataset on Earnings Call Transcripts' (ECTs) QA sessions as the answers given by company representatives are often open to subjective interpretations and scrutiny. The dataset includes 49, 446 annotations for long-form QA pairs across six features: Assertive, Cautious, Optimistic, Specific, Clear, and Relevant . These features are carefully selected to encompass the key attributes that reflect the tone of the answers provided during QA sessions across different domains. Our findings are that the best-performing Pre-trained Language Model (PLM), RoBERTa-base, has similar weighted F1 scores to Llama-3-70b-Chat on features with lower subjectivity, such as Relevant and Clear, with a mean difference of 2 .
- North America > United States > Georgia > Fulton County > Atlanta (0.05)
- Asia > India > Maharashtra > Mumbai (0.05)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- (15 more...)
- Financial News (1.00)
- Research Report > New Finding (0.87)
- Media > News (1.00)
- Law (1.00)
- Banking & Finance > Trading (1.00)
- (3 more...)
Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation
Lu, Yen-Ju, Thebaud, Thomas, Moro-Velazquez, Laureano, Dehak, Najim, Villalba, Jesus
We present Paired by the Teacher (PbT), a two-stage teacher-student pipeline that synthesizes accurate input-output pairs without human labels or parallel data. In many low-resource natural language generation (NLG) scenarios, practitioners may have only raw outputs, like highlights, recaps, or questions, or only raw inputs, such as articles, dialogues, or paragraphs, but seldom both. This mismatch forces small models to learn from very few examples or rely on costly, broad-scope synthetic examples produced by large LLMs. PbT addresses this by asking a teacher LLM to compress each unpaired example into a concise intermediate representation (IR), and training a student to reconstruct inputs from IRs. This enables outputs to be paired with student-generated inputs, yielding high-quality synthetic data. We evaluate PbT on five benchmarks-document summarization (XSum, CNNDM), dialogue summarization (SAMSum, DialogSum), and question generation (SQuAD)-as well as an unpaired setting on SwitchBoard (paired with DialogSum summaries). An 8B student trained only on PbT data outperforms models trained on 70 B teacher-generated corpora and other unsupervised baselines, coming within 1.2 ROUGE-L of human-annotated pairs and closing 82% of the oracle gap at one-third the annotation cost of direct synthesis. Human evaluation on SwitchBoard further confirms that only PbT produces concise, faithful summaries aligned with the target style, highlighting its advantage of generating in-domain sources that avoid the mismatch, limiting direct synthesis.
- North America > Canada (0.14)
- North America > United States > Texas > Collin County > Plano (0.04)
- North America > United States > Texas > Hidalgo County > Donna (0.04)
- (6 more...)
Comparative Analysis of FOLD-SE vs. FOLD-R++ in Binary Classification and XGBoost in Multi-Category Classification
Murthy, Akshay, Sebastian, Shawn, Shangle, Manil, Wang, Huaduo, Dasgupta, Sopam, Gupta, Gopal
Recently, the demand for Machine Learning (ML) models that can balance accuracy, efficiency, and interpreability has grown significantly. Traditionally, there has been a tradeoff between accuracy and explainability in predictive models, with models such as Neural Networks achieving high accuracy on complex datasets while sacrificing internal transparency. As such, new rule-based algorithms such as FOLD-SE have been developed that provide tangible justification for predictions in the form of interpretable rule sets. The primary objective of this study was to compare FOLD-SE and FOLD-R++, both rule-based classifiers, in binary classification and evaluate how FOLD-SE performs against XGBoost, a widely used ensemble classifier, when applied to multi-category classification. We hypothesized that because FOLD-SE can generate a condensed rule set in a more explainable manner, it would lose upwards of an average of 3 percent in accuracy and F1 score when compared with XGBoost and FOLD-R++ in multiclass and binary classification, respectively. The research used data collections for classification, with accuracy, F1 scores, and processing time as the primary performance measures. Outcomes show that FOLD-SE is superior to FOLD-R++ in terms of binary classification by offering fewer rules but losing a minor percentage of accuracy and efficiency in processing time; in tasks that involve multi-category classifications, FOLD-SE is more precise and far more efficient compared to XGBoost, in addition to generating a comprehensible rule set. The results point out that FOLD-SE is a better choice for both binary tasks and classifications with multiple categories. Therefore, these results demonstrate that rule-based approaches like FOLD-SE can bridge the gap between explainability and performance, highlighting their potential as viable alternatives to black-box models in diverse classification tasks.
- North America > United States > Wisconsin (0.05)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- North America > United States > Texas > Collin County > Frisco (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area (0.50)
- Banking & Finance (0.47)
Adaptive Monitoring and Real-World Evaluation of Agentic AI Systems
Agentic artificial intelligence (AI) -- multi-agent systems that combine large language models with external tools and autonomous planning -- are rapidly transitioning from research laboratories into high-stakes domains. Our earlier "Basic" paper introduced a five-axis framework and proposed preliminary metrics such as goal drift and harm reduction but did not provide an algorithmic instantiation or empirical evidence. This "Advanced" sequel fills that gap. First, we revisit recent benchmarks and industrial deployments to show that technical metrics still dominate evaluations: a systematic review of 84 papers from 2023--2025 found that 83% report capability metrics while only 30% consider human-centred or economic axes [2]. Second, we formalise an Adaptive Multi-Dimensional Monitoring (AMDM) algorithm that normalises heterogeneous metrics, applies per-axis exponentially weighted moving-average thresholds and performs joint anomaly detection via the Mahalanobis distance [7]. Third, we conduct simulations and real-world experiments. AMDM cuts anomaly-detection latency from 12.3 s to 5.6 s on simulated goal drift and reduces false-positive rates from 4.5% to 0.9% compared with static thresholds. We present a comparison table and ROC/PR curves, and we reanalyse case studies to surface missing metrics. Code, data and a reproducibility checklist accompany this paper to facilitate replication. The code supporting this work is available at https://github.com/Manishms18/Adaptive-Multi-Dimensional-Monitoring.
DoorDash plans to test drone deliveries in San Francisco warehouse
Things to Do in L.A. Tap to enable a layout that focuses on the article. Masslie Arias, of DoorDash, prepares to load a delivery package on a hovering drone on July 31 in Frisco, Texas. This is read by an automated voice. Please report any issues or inconsistencies here . Food delivery app DoorDash is setting its sights on a new destination to test out flying drone deliveries: San Francisco.
- North America > United States > California > San Francisco County > San Francisco (0.65)
- North America > United States > Texas > Collin County > Frisco (0.25)
- North America > United States > California > Los Angeles County > Los Angeles (0.08)
- (12 more...)